Evaluation of extraction and enrichment methods for recovery of respiratory RNA viruses in a metagenomics approach

Viral metagenomics is increasingly applied in viral detection and virome characterization. Different extraction and enrichment techniques may be adopted; however, reports on their actual influence on viral recovery are often conflicting. Using a three-step enrichment procedure, the effect of three extraction kits and the influence of DNase treatment with or without rRNA removal on respiratory RNA virus recovery from nasopharyngeal swab samples were evaluated. A viral cocktail containing six different RNA viruses pooled in equal volumes was subjected to the different extraction and enrichment methods, sequenced using the Illumina MiSeq, and analysed using Genome Detective. The PureLink® Viral RNA/DNA Mini Kit (PureLink) was highly efficient, with better recovery of all the viral agents in the cocktail. The use of rRNA treatment increased viral recovery with PureLink and the QIAamp® Viral RNA Mini Kit, while giving a recovery rate comparable to DNase only with the QIAamp® MinElute Virus Spin Kit. The low read counts and genome coverage observed for some of the viruses could be attributed to their low abundance. Depending on the sample matrix, extraction choice and enrichment strategy may influence recovery of respiratory RNA viruses in metagenomics studies; therefore, individual evaluation and adoption may be necessary for a robust result.

Despite the revolutionary potential of mNGS in clinical virology, biological samples are laden with high background genetic material from host and bacteria, which may significantly hamper the detection of viral pathogens (Lewandowska et al., 2017;Liu et al., 2020;Zhang et al., 2018). In this regard, depending on the sample source, specific strategies for sample preparation and enrichment are often required (de Vries et al., 2021;Hall et al., 2014;Rosseel et al., 2015). Furthermore, it may be challenging to ascertain the most suitable extraction and enrichment method for a given sample type when the aim is to detect a specific virus or group of viruses (Hall et al., 2014). Previous studies have evaluated different extraction platforms and concluded that the choice of nucleic acid extraction method may influence viral detection with mNGS (Klenner et al., 2017;Lewandowski et al., 2017;Peng et al., 2020;Sabatier et al., 2020;Wen et al., 2016;Zhang et al., 2018). Moreover, various viral enrichment methods have been applied to several sample types (Goya et al., 2018;Kohl et al., 2015a;Lewandowska et al., 2017;Liu et al., 2020;O'Flaherty et al., 2018); however, owing to differences in sample matrices and in the quality of nucleic material, there are still conflicting reports on their impact and on which of them are really necessary. Besides, several studies use artificial samples, which do not always mirror the complex characteristics of biological samples.
The goal of the study is to optimize mNGS for application in RNA respiratory virus diagnostics and discovery. Therefore, this study focuses on the recovery of respiratory RNA viruses, the majority of which are responsible for relevant (re)emerging viral threats with pandemic potential. Using clinical samples [nasopharyngeal swabs (NPS) positive for different RNA viruses in viral transport media (VTM)], we assessed the sensitivity of three different extraction methods with a 3-step enrichment method (centrifugation, filtration, and nuclease treatment), and the impact of DNase treatment only versus DNase treatment and ribosomal RNA (rRNA) removal on viral recovery in an mNGS workflow. The detection of targeted viruses and the percentage of pathogen genome coverage obtained were used as the measures of performance.

Clinical virus specimen preparation
A biological reagent containing six viruses belonging to different RNA viral families was assembled from clinical specimens (NPS in VTM) which tested positive for the different viruses during routine diagnostic testing (Table A1). These known viruses were selected to represent different viral families and genome sizes (ranging from 7 to 29.8 kb). The viral samples were provided by the National Health Laboratory Service, Universitas Bloemfontein, Free State and the National Institute for Communicable Diseases, Sandringham, Johannesburg. Samples were received on dry ice and immediately stored at −80 °C until processing.

Preparation of viral cocktail and confirmation of each virus in the cocktail
A viral cocktail was prepared using 1000 µL of the VTM (BD Diagnostics, Franklin Lakes, NJ) of each viral agent. A 300 µL aliquot of the viral cocktail was used to confirm the presence of each virus using the QIAstat-Dx Respiratory SARS-CoV-2 Panel (Qiagen, Hilden, Germany) as per the manufacturer's instructions. The Panel is a 40-cycle automated multiplex real-time PCR (mRT-PCR) test that qualitatively detects and identifies multiple respiratory viral and bacterial nucleic acids (as shown in supplementary table S1) in VTM. The QIAstat-Dx Respiratory SARS-CoV-2 Panel simultaneously detected each virus in the cocktail and generated its Ct value and amplification curve.

Enrichment Prior to Nucleic Acid Extraction
Based on a review of previous enrichment methods (Hall et al., 2014;Rosseel et al., 2015), the viral cocktail was enriched for viral RNA as follows. The cocktail was centrifuged at 6000 × g (8000 rpm) for 5 min, and the supernatant was filtered through a 0.22 µm membrane filter (Sigma-Aldrich, St. Louis, Missouri) to exclude the remaining cellular debris. To remove free-floating nucleic acids, nuclease treatment was performed using 1X TURBO DNase buffer (Life Technologies, Carlsbad, CA), 0.1 U µL⁻¹ RNase ONE (Promega, Fitchburg, WI), and 0.1 U µL⁻¹ TURBO DNase (Life Technologies, Carlsbad, CA), followed by incubation at 37 °C for 90 min.
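The centrifugation step is reported both as a relative centrifugal force (6000 × g) and as a rotor speed (8000 rpm). The two are linked by the rotor radius, which is not stated here, so a laboratory using a different rotor would need to recompute the speed. The following is a minimal sketch of the standard conversion, RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The radius of 8.4 cm is an assumption chosen only because it is roughly what the reported pair of values implies; it is not taken from the study.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor radius (cm); NOT reported in the study.
ASSUMED_RADIUS_CM = 8.4

if __name__ == "__main__":
    print(f"8000 rpm at {ASSUMED_RADIUS_CM} cm: "
          f"{rcf_from_rpm(8000, ASSUMED_RADIUS_CM):.0f} x g")
    print(f"6000 x g at 7.0 cm needs "
          f"{rpm_from_rcf(6000, 7.0):.0f} rpm")
```

Specifying the force in × g rather than rpm is what makes the step transferable between centrifuges.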

Extraction of viral nucleic material
Three commercially available kits, the QIAamp Viral RNA Mini Kit (QIAamp Mini) (Qiagen, Hilden, Germany), the QIAamp MinElute Virus Spin Kit (QIAamp Spin) (Qiagen, Hilden, Germany), and the PureLink Viral RNA/DNA Mini Kit (PureLink) (Thermo Fisher Scientific, Waltham, MA), were evaluated for simultaneous isolation of viral RNA. Major characteristics of the extraction kits are shown in Table A2. All three kits are widely used in diagnostic and research laboratories.
Prior to extraction, samples were thawed and homogenized by vortexing. Nucleic acids were extracted as per the manufacturer's instructions, except that carrier RNA was not used, and eluted in 50 µL of AVE buffer or RNase-free water. A no-template negative control (NTC) consisting of RNase-free water was included for each kit to check for cross/kitome contamination during the process.

Reproducibility
To assess reproducibility, the workflow was performed in duplicate for each extraction kit (two DNase-only samples and two DNase and rRNA treated samples per kit). Sequencing was also performed in two different runs.
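The duplicate design can be written out explicitly as a sample sheet. The sketch below simply enumerates every combination of kit, treatment, and sequencing run as described in the text; the dictionary keys and the `build_layout` helper are illustrative names, not part of the study, and the no-template controls are deliberately left out of the count.

```python
from itertools import product

KITS = ["PureLink", "QIAamp Mini", "QIAamp Spin"]
TREATMENTS = ["DNase only", "DNase + rRNA removal"]
RUNS = [1, 2]  # two independent sequencing runs


def build_layout():
    """Return one record per kit x treatment x run combination."""
    return [
        {"run": run, "kit": kit, "treatment": treatment}
        for run, kit, treatment in product(RUNS, KITS, TREATMENTS)
    ]


layout = build_layout()

if __name__ == "__main__":
    for sample in layout:
        print(sample)
    print("total samples:", len(layout))
```

This gives 12 test samples in total, four per kit and six per run, which matches the numbers stated for the DNase treatment and for the sequencing runs.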

Quantification of the extracted RNA
The extracted RNA yield was quantified using the Qubit assay kit on the Qubit® 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA) as per the manufacturer's instructions. The Qubit® RNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA) is designed to be accurate for RNA sample concentrations between 250 pg/µL and 100 ng/µL.

DNase treatment and RNA purification
DNase treatment was performed on 12 extracted RNA samples (four for each kit). The DNase treatment was performed in a 50 µL reaction using the TURBO DNA-free™ Kit (Thermo Fisher Scientific, Waltham, MA) as per the manufacturer's instructions. DNase-treated samples were purified using the RNeasy Plus Mini Kit (Qiagen, Hilden, Germany), as per the manufacturer's instructions.

Library preparation and sequencing
The libraries were prepared using the QIAseq FX Single Cell RNA Library Kit (Qiagen, Hilden, Germany) as per the manufacturer's instructions. The quality of the prepared libraries was assessed on the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). The average fragment size obtained was approximately 450 bp for all the samples, which was optimal for the desired read length of 250 bp paired end. The concentration of the libraries after preparation was 4 nM, and a final concentration of 8 pM was loaded into a V3 MiSeq reagent cartridge (Illumina, San Diego, CA). Sequencing was performed in two independent runs using the Illumina MiSeq platform to generate paired-end reads of 2 × 250 bp.
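Going from a 4 nM library to an 8 pM loading concentration is a 500-fold dilution. The arithmetic (C1 × V1 = C2 × V2) can be sketched as below. The final volume of 600 µL is an assumption used only for illustration, and the function is a hypothetical helper; in practice the dilution is carried out in several steps that include denaturation, which this single-step calculation does not model.

```python
def dilution_volumes(stock_nM: float, target_pM: float, final_volume_uL: float):
    """Return (stock_uL, diluent_uL) for a single-step C1*V1 = C2*V2 dilution."""
    stock_pM = stock_nM * 1000.0  # 1 nM = 1000 pM
    if target_pM > stock_pM:
        raise ValueError("target concentration exceeds stock concentration")
    stock_uL = target_pM * final_volume_uL / stock_pM
    return stock_uL, final_volume_uL - stock_uL


def dilution_factor(stock_nM: float, target_pM: float) -> float:
    """Fold dilution needed to go from stock (nM) to target (pM)."""
    return stock_nM * 1000.0 / target_pM


if __name__ == "__main__":
    # Assumed final volume of 600 uL; not stated in the text.
    stock, diluent = dilution_volumes(4.0, 8.0, 600.0)
    print(f"fold dilution: {dilution_factor(4.0, 8.0):.0f}")
    print(f"library: {stock:.2f} uL, diluent: {diluent:.2f} uL")
```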

Confirmation of presence of each virus
All the viruses pooled in the cocktail were detected by the QIAstat-Dx Respiratory SARS-CoV-2 Panel at varying Ct values, as shown in Table A3. Besides the viruses in the cocktail, no other viral respiratory pathogen targeted by the panel was detected.

Nucleic acid concentration of extracted RNA per kit
The extraction kits exhibited varying recovery rates of the viruses after RNA extraction and cDNA synthesis (Table A4). PureLink exhibited the highest recovery rate. Nucleic acid in the included negative control was undetectable for all the extraction kits after RNA extraction.

Sequence data overview
The first sequence run generated a quality score (Q-score) of more than 30 for 80% of samples, with a cluster density of 1014 K/mm² and 98.1% of clusters passing filter. The second sequence run generated a Q-score of more than 30 for 76% of samples, with a cluster density of 1214 K/mm² and 96.5% of clusters passing filter. The corresponding total reads and viral reads per extraction kit and enrichment method are shown in Table A5.

Sensitivity of the extraction kits for the detection of targeted viruses
With a total of 71496 reads, PureLink detected all six targeted viruses. The highest genome coverages were for SARS-CoV-2 (100 %) and HRV A (94.4 %), while the lowest was for IVA (6 %) (Table A6). The QIAamp Mini and QIAamp Spin, with totals of 9331 and 16232 reads, respectively, for the targeted viruses, detected five of the six viral pathogens in the cocktail (RSV, HRV A, SARS-CoV-2, HMPV, and HPIV-3), with no IVA detected. The highest genome coverage by the QIAamp Mini was for HRV A (91 %), while the highest genome coverage for the QIAamp Spin was for SARS-CoV-2 (88.9 %) (Table A6).
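The genome coverage percentages reported here are taken from the analysis tool, and its exact calculation is not described in this section. As a rough illustration, coverage of this kind is commonly understood as breadth of coverage: the fraction of reference positions covered by at least one aligned read. The sketch below computes that from alignment intervals; the function name, the half-open 0-based interval convention, and the toy intervals are all assumptions for illustration and are not data from the study.

```python
def genome_coverage_percent(intervals, genome_length):
    """Breadth of coverage (%) from aligned read intervals.

    intervals: iterable of (start, end) pairs, 0-based and half-open,
               giving where each read aligns on the reference.
    genome_length: length of the reference genome in bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip each interval to the reference boundaries.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    # Toy example: three reads on a hypothetical 1000-base reference.
    toy_reads = [(0, 250), (200, 450), (900, 1000)]
    print(genome_coverage_percent(toy_reads, 1000))
```

Merging overlapping intervals before summing matters because many reads stacked on the same region increase depth but not breadth, which is why a virus can have many reads yet incomplete genome coverage.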

DNase treatment vs DNase and rRNA treatment on PureLink
The use of DNase only with PureLink resulted in detection of two of the six targeted viruses and a lower recovery of HRV A (as seen by lower reads and genome coverage in the two different runs), with no detection of RSV, HMPV, IVA, or HPIV-3 (Table A6). In comparison, the use of DNase and rRNA treatment with PureLink yielded detection of all the viral targets and a consequent increase in viral read recovery and genome coverage (Table A6).

Table A3
The Ct values detected for each virus in the cocktail using the QIAstat-Dx Respiratory SARS-CoV-2 Panel.

DNase treatment vs DNase and rRNA treatment on QIAamp Mini
The use of DNase only with the QIAamp Mini yielded detection of two of the six viral targets in both independent runs. However, the subsequent use of rRNA treatment on the already DNase-treated samples resulted in detection of five of the six viral targets. It also significantly increased the viral reads and genome coverage of HRV A and SARS-CoV-2 (Table A6).

DNase treatment vs DNase and rRNA treatment on QIAamp Spin
The use of DNase only or DNase combined with rRNA treatment did not significantly change the viral recovery of QIAamp Spin. As shown in Table A6, both methods resulted in a similar viral recovery rate, each detecting some and missing others of the targeted viral pathogens in the cocktail.

Method repeatability
To assess the method repeatability, two independent sequence runs were performed. Each run included one DNase-only sample and one DNase and rRNA treated sample per kit, resulting in 6 samples per run. As shown in Table A6, both runs resulted in similar viral recovery rates (in terms of genome coverage) for the targeted viruses, and these were in part consistent among the extraction kits evaluated. However, QIAamp Spin exhibited some variation in repeatability, mainly for the viruses at low abundance.

Correlation of viral recovery with Ct values
A correlation was observed between Ct value and viral detection rate across all extraction kits, regardless of enrichment method. As shown in Table A6, HRV A and SARS-CoV-2, with Ct values ≤ 20, were detected by all the extraction kits and with both enrichment methods. Conversely, IVA, HMPV and HPIV-3, with Ct values above 30, were recovered with relatively fewer viral reads and lower genome coverage, and mostly in the rRNA treated samples.
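To give a sense of scale for the gap between Ct ≤ 20 and Ct > 30, the difference in starting template can be approximated from the Ct difference. The sketch below assumes ideal exponential amplification (a doubling per cycle, efficiency of 1.0) and equal efficiency for both targets; neither assumption is verified here, the panel used is qualitative, and Ct values from different assays in a multiplex are not strictly comparable, so the result is only an order-of-magnitude guide. The function name is illustrative.

```python
def fold_difference(ct_low: float, ct_high: float, efficiency: float = 1.0) -> float:
    """Approximate fold difference in starting template between two Ct values.

    efficiency: amplification efficiency per cycle (1.0 = perfect doubling).
    Assumes both targets amplify with the same efficiency.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (ct_high - ct_low)


if __name__ == "__main__":
    # Ct 20 versus Ct 30: the boundary values mentioned in the text.
    print(f"{fold_difference(20, 30):.0f}-fold")
    # A less efficient reaction (90 %) gives a smaller estimated gap.
    print(f"{fold_difference(20, 30, efficiency=0.9):.0f}-fold")
```

Under these assumptions a 10-cycle difference corresponds to roughly a thousand-fold difference in template, which is in line with the much lower read counts and coverage seen for the high-Ct viruses.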

Kitome/cross contamination in the negative control
As shown in Table A7, neither the targeted viruses nor any other viruses apart from bacteriophages were detected in the negative controls of the extraction kits. The result suggested no cross/kitome contamination in the workflow.

Discussion
The impact of the choice of nucleic acid extraction method on extraction efficiency has been widely reported, with varying and sometimes conflicting conclusions on which method is best (Kohl et al., 2015b;Lewandowska et al., 2017;Lewandowski et al., 2017;Sabatier et al., 2020). For diagnostic purposes, especially in clinical settings and virus discovery, independent evaluation of the extraction and enrichment methods may be necessary (Hall et al., 2014). In this study, the higher extraction efficiency exhibited by PureLink compared to QIAamp Mini and QIAamp Spin suggests that extraction efficiency may vary among kits for the same sample, an occurrence which has been previously reported (Klenner et al., 2017). However, the higher input volume applicable to PureLink and the different lysing chemistry (use of Proteinase K) could also have contributed to the increased viral RNA recovery rate. Although the use of carrier RNA could have significantly increased the efficiency of all the kits evaluated, NGS routinely reads all the sequences in a specimen, so carrier RNA would also be sequenced, which could reduce the viral sequencing read depth.
Regarding the impact of extraction choice on viral recovery, Klenner and colleagues evaluated the mNGS viral recovery of four manual extraction kits and concluded that the choice of extraction kit does not significantly impact the yield of viral reads obtained by NGS (Klenner et al., 2017). However, the use of cell culture supernatant in their study means the viruses were probably present in high abundance, which is not always the case with clinical samples, especially nasopharyngeal swabs. While PureLink performed better than QIAamp Mini and QIAamp Spin, the lower genome coverage and low read numbers noted for HPIV-3, IVA, and HMPV could be attributed to the high Ct values of these viruses, which are indicative of low viral abundance. This may also explain why some of the viruses at lower abundance were not recovered by the other kits evaluated. This phenomenon is not surprising. A similar study by Zhang et al. (2018) reported the detection of only 2 reads for IVA owing to low viral abundance (Ct values between 31 and 37). Also, a similar validation-based study reported the lack of read detection for 8/25 viruses targeted. The authors attributed this to the low concentration of the missed viruses in the pathogen reagent and further suggested that variations in analytical thresholds and stochastic effects in detecting viruses with extremely low abundance could have played a contributory role (Lewandowska et al., 2017). In essence, this phenomenon highlights the importance of high viral abundance in mNGS (Thorburn et al., 2015).
The use of rRNA treatment after DNase treatment to get rid of non-pathogen RNA is currently bypassed in several studies, mostly because of the additional complexity and cost it adds to the mNGS workflow (Graf et al., 2016;Rosseel et al., 2015). Some studies reported better viral yield and coverage with the use of rRNA treatment on some sample types (Bergner et al., 2019;Manso et al., 2017;Rosseel et al., 2015), while others reported little or no difference when only DNase treatment is adopted (Goya et al., 2018;Graf et al., 2016). In this study, the use of rRNA treatment resulted in increased recovery of the targeted viruses with high genome coverage, especially with PureLink and QIAamp Mini. However, with the QIAamp Spin, the use of DNase only yielded a result highly comparable to the combined use of DNase and rRNA depletion, as seen by the similar viral recovery rate and genome coverage percentage in both enrichment methods and replicates. For studies seeking to characterize known viruses or detect unknown viruses in a sample matrix similar to the one used in this study, the combined use of DNase and rRNA treatment may be a preferred option.
Additionally, this study assessed the reproducibility and reliability of the adopted workflow by setting up a duplicate run within one month of the first run. As noted, the viral reads in the duplicate samples for both DNase only and DNase and rRNA treated samples of PureLink and QIAamp Mini exhibited a degree of consistency. However, viruses at low abundance were not detected in the second replicates of QIAamp Spin, regardless of the enrichment method. This could be due to freeze-thawing, which may have reduced the sensitivity of the kit for the already low-abundance pathogens. Furthermore, regarding kitome/cross contamination, none of the targeted viruses nor other known or unknown viruses were detected in the nuclease-free water (NFW) control included in the workflow for the three extraction kits. The detection of bacteriophages to varying degrees is, however, not unique to this study; the observation is consistent with a previous report (Sabatier et al., 2020) and may be attributable to reagent contamination.
The method established in this study builds upon earlier mNGS techniques for the detection of RNA respiratory viruses, including those present at low abundance, and can play an important role in surveillance, molecular epidemiology, and diagnostics of respiratory RNA viruses. The use of different viruses with different characteristics (including varying genome size) suggests that the methodology established may be versatile in detecting other RNA viruses from NPS samples not included in this study. Additionally, the use of commercially available reagents and a non-complex library preparation procedure suggests that this method could be easily adapted in a variety of laboratory settings. Taken together, the approaches described here offer an efficient method to harness the power of mNGS in routine laboratory and clinical RNA respiratory virus detection and discovery.

Conclusions
The results in this study suggest that mNGS-based detection of respiratory RNA viruses from clinical nasopharyngeal swab samples may be impacted by nucleic acid extraction choice and the associated enrichment techniques adopted. While the workflow complexity and cost of rRNA treatment are debatable, this study shows that the adoption of rRNA treatment could result in significantly increased viral detection and genome coverage, especially when used with PureLink. While the QIAamp Spin performed least well, especially with viruses present at low abundance in the cocktail, its viral recovery rate and relatively good genome coverage for viruses at high abundance, even without the use of rRNA treatment, could be explored. Notably, numerous factors may influence the performance of extraction kits and enrichment methods. These factors include sample matrix (liquid, tissue or solid samples; frozen or fresh), pathogen of interest/abundance and budget. Therefore, individual assessment to determine the best workflow based on these factors may be imperative.

Ethics approval statement
also goes to Dr Daniel Morobadi for his assistance in identifying the samples at the NHLS virology laboratory.